(54) Processing of textual information and automated apprehension of information

(57) Scheme for the automated apprehension of 
textual information conveyed in an input string. The 
input string is segmented to generate segments and/or 
semantical units. The following steps are repeated for 
each segment in the input string until a subset for each 
segment in said input string is identified: 



a. identifying a matching semantical unit in a fractal 
hierarchical knowledge database of semantical 
units and pointers, said matching semantical units 
being deemed to be related to a segment of said 
input string, 

b. determining the fitness of said matching semantical unit by taking into consideration said semantical unit's associations,

c. defining a subset of information related to said matching semantical unit within said fractal hierarchical knowledge database.

Then these subsets are combined to form a resulting semantic network.

Description

TECHNICAL FIELD 

[0001] The invention concerns the automated apprehension of textual information conveyed in an input string, the
structure of a special knowledge database used in this context, and systems based thereon. 

BACKGROUND OF THE INVENTION 

[0002] In their attempts at automated apprehension of the meaning of speech and text, neither linguists nor computer
scientists have made much progress. In our opinion they have concentrated too much on the logical structure of the 
texts themselves and have neglected the structure of the world. Speech and textual information are obviously based on 
the structure of the world and refer to it. 

[0003] Quite some progress has been made in phonological and/or phonetical, lexical, morphological, and syntactical 
analyses of natural language processing. However, when it comes to understanding the meaning of speech, i.e. the
semantical interpretation of speech, the breakthrough has not yet been achieved. As a consequence, the pragmatical 
analysis, the control of tools and devices by natural speech, has also not been developed very far. 
[0004] A typical example of a modern speech/text recognition system is described in the article "Enabling agents to
work together", by R.V. Guha et al., Communications of the ACM, Vol. 37, No. 7, July 1994, pp. 127-142, and reviewed
by T.J. Schult in the German article "Transparente Trivialitäten; Cyc-Wissensbasis in WWW", c't, 1996, Vol. 10, pp. 118-121.
The Cyc-system described by R.V. Guha is a knowledge based system for true/false categorization of input statements.
T.J. Schult points out in his article that the knowledge representation in the database used in the Cyc-system is
not standardized and uses only the following relations for deduction: 'is element of', 'is subset of', and 'has subsets'.
[0005] It is an object of the present invention to provide a new structure for the representation of knowledge in a database.

[0006] It is an object of the present invention to provide a new structure for the representation of knowledge in a data- 
base that allows for ease of knowledge increase ("learning") and/or ease of knowledge retrieval ("understanding"). 
[0007] It is another object of the present invention to provide systems based on a new structure for the representation 
of knowledge in a database. 


SUMMARY OF THE INVENTION 

[0008] The present invention concerns the processing of textual information conveyed in an input string together with
information contained in a knowledge database. The knowledge database represents a network of hierarchically
arranged semantical units which are similar across hierarchies. According to the present invention, the input string is
segmented into segments. These segments are then combined with semantical units from said knowledge database to 
generate a resulting semantic network. This resulting semantic network comprises hierarchically arranged semantical 
units which are similar across hierarchies. 

[0009] The present invention further concerns a specific fractal hierarchical knowledge database for use in connection 
with the automated apprehension of information and an apparatus for the processing of textual information conveyed in
an input string together with information contained in a knowledge database. 

[0010] Furthermore, the present invention concerns the processing of textual information conveyed in an input string
together with information contained in a knowledge database for the automated apprehension of information.
[0011] Advantages of the present invention are addressed in connection with the detailed description or are apparent
from the description.

DESCRIPTION OF THE DRAWINGS 

[0012] The invention is described in detail below with reference to the following schematic drawings. 


FIG. 1 shows the elements (semantical units and pointers) of a fractal hierarchical knowledge database, in
accordance with the present invention. Note that a pointer can be a semantical unit, in which case the
semantical unit is drawn on top of the pointer.

FIG. 2A is a schematic block diagram of a first embodiment, in accordance with the present invention.

FIG. 2B is a schematic block diagram of a second embodiment, in accordance with the present invention.

FIG. 2C is a schematic block diagram of a third embodiment, in accordance with the present invention.



FIG. 3A-3C illustrate how an input string can be transformed into an input network, in accordance with a first embodiment
of the present invention.


FIG. 4 illustrates a fractal hierarchical knowledge database, in accordance with the present invention. 



FIG. 5 illustrates inherited attributes and relations of 'plant1', in accordance with the present invention.

FIG. 6 illustrates a local network around 'plant1', in accordance with the present invention.

FIG. 7 illustrates inherited attributes and relations of 'plant2', in accordance with the present invention. 

FIG. 8 illustrates a local network around 'plant2', in accordance with the present invention.


FIG. 9 illustrates inherited attributes and relations of 'meadow', in accordance with the present invention. 

FIG. 10 illustrates a local network around 'meadow', in accordance with the present invention. 

FIG. 11 illustrates a resulting semantic network of sentence 1, in accordance with the present invention.



DESCRIPTION OF PREFERRED EMBODIMENTS: 



[0013] In the following, the basic concept of the present invention is described. Before addressing different embodiments,
the relevant terms and expressions are defined and explained.

[0014] The words "interpretation" and "apprehension" are herein used to describe a process which starts with an input
string, e.g. some sentences and/or questions, analyzes the textual information (also referred to as original information)
conveyed by, or carried in, this string, and creates an appropriate output, such as a summary, an answer, a question,
a speech, or an action/reaction. The present invention achieves this by converting the input string into segments and/or
semantical units. Then, the segments or semantical units are related to (corresponding) information in a knowledge
database, yielding a resulting semantic network that is a representation of the input information. By inverting the resulting
semantic network one can create a human-understandable output. The inventive approach allows for an automated
apprehension of the meaning and/or the information conveyed in an input string and possibly for an appropriate answer
and/or reaction.

[0015] The expression "textual information" is defined to be any kind of written information, or information contained
in speech. "Textual information" is not limited to human-readable or human-understandable information. This expression is also
meant to cover program strings, e.g. in machine readable form, or encoded information, e.g. as transmitted through a 
network. 

[0016] The expression "theme" is herein used to describe the area, field, matter, topic, or subject to which the original
information is deemed to be related.

[0017] A crucial component of the present invention is the so-called knowledge database which is addressed in the
following sections. This knowledge database has a novel and unique structure. 

[0018] Knowledge database: A knowledge database is a kind of library describing the knowledge of the world, or a
particular area of interest thereof, by using a well-defined structure that consists of components such as possible relevant
types of semantical units and their possible mutual connections, as schematically illustrated in Figure 1. The inventive
knowledge database consists of semantical units and various types of pointers between semantical units, where
the pointers themselves may be regarded as semantical units. Each pointer may carry a fixed or variable weight (herein
also called semantical distance), where the inverse of the weight of a pointer represents some kind of semantical distance
between the two semantical units it connects, i.e., it represents the degree of (semantical) association between
the two semantical units across this particular link.

[0019] Since the weights used in connection with the present invention are attached to the links, it is clear which two
semantical units' semantical distance they correspond to. Weights are not compared, but used to compute the semantical
distance of any two linked semantical units (the two semantical units may be connected through further semantical
units, their semantical distance then being the product or some other suitable combination of the individual distances).
Thus, this concept of semantical distance establishes a metric on the knowledge database. Finally, it is advantageous
to use a variable or fixed threshold below which connections are ignored. So if two semantical units are connected
through (for instance) three links (thus involving two more semantical units), and the product or other suitable combination
of the three weights is below the threshold, then one can assume that there is no association between the two
semantical units. This method makes it possible to keep the network local, i.e., each semantical unit has only a limited number of
associations and the local network structure around each semantical unit is not too complex. For example, there are no
loops which could cause contradictions. A local network is herein also referred to as a subset.
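
The thresholding described in paragraph [0019] can be sketched as follows. This is a minimal illustration only, not the claimed implementation: it assumes weights between 0 and 1 that express the degree of association, combines them by product along a path, and treats every link as usable in both directions. The names `local_network`, `links`, and `threshold`, as well as the sample weights, are hypothetical.

```python
from collections import defaultdict


def local_network(links, start, threshold):
    """Return {unit: association} for every unit whose best path product
    from `start` is at least `threshold` (the 'subset' around `start`)."""
    graph = defaultdict(list)
    for (a, b), weight in links.items():
        graph[a].append((b, weight))
        graph[b].append((a, weight))  # assumption: links usable both ways

    best = {start: 1.0}
    frontier = [start]
    while frontier:
        unit = frontier.pop()
        for neighbour, weight in graph[unit]:
            strength = best[unit] * weight  # product of weights along the path
            # Connections below the threshold are ignored entirely.
            if strength >= threshold and strength > best.get(neighbour, 0.0):
                best[neighbour] = strength
                frontier.append(neighbour)
    del best[start]
    return best


# Illustrative weights only; they are not taken from the specification.
links = {
    ("meadow", "grass"): 0.9,
    ("grass", "plant"): 0.8,
    ("plant", "factory"): 0.1,
}
subset = local_network(links, "meadow", threshold=0.5)
```

With these sample weights, 'factory' falls outside the local network around 'meadow' because the product of the three weights is below the threshold.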
[0020] Furthermore, the weights used herein might be variable. This means that the weights can be adjusted depending
on the given/presumed theme. Certain rules for adjusting the weights according to the given/presumed theme might
be stored with the links to which the respective weights are attached. 

[0021] Semantical units in the knowledge database and/or in the resulting semantic network may carry a "potential".
If a semantical unit carries a potential, it corresponds to the semantical unit's importance in relation to the segment or
semantical unit from the input string currently under investigation.
[0022] The "matching link" from a segment or semantical unit to a semantical unit in the knowledge database may
carry a "fitness". If a matching link carries a fitness it corresponds to the classification probability, i.e. the probability that 
the segment or semantical unit from the input string has been correctly matched with a semantical unit in the knowledge 
database. 

[0023] When referring to a knowledge database, either a library describing knowledge of the world, or a library with
application specific information is meant. The knowledge database is herein also referred to as world library. An example
of an application specific knowledge database is a database which comprises information relevant for the processing
of insurance claims. An example of such a knowledge database will be given later.

[0024] The knowledge database reflects the knowledge of the world or the knowledge of a certain area or field. The
content of this knowledge database always forms a subset of the content of the real world, corresponding to a limited
life experience of the computer and the human being who programmed the computer. However, a knowledge database
can be expanded either by automated learning from analyzed input, or by adding separately obtained sub-worlds (e.g.
in the form of application-specific modules). It is conceivable to provide updates for the knowledge database through an
intranet or internet. Likewise, one might for example link a particular knowledge database when processing a piece of 
literature, or when analyzing a computer program. 

[0025] The structured representation of aspects of the world with the knowledge database is achieved by a multiscale
approach related to the work of B. Mandelbrot and K. Wilson. Self-similar representations are used on different scales
to describe the behavior of objects in a dynamical hierarchical network, as will be described in connection with an example
(see Figure 4). Furthermore, self-similar algorithms are used when making use of the knowledge contained in this
database. However, the inventive approach goes beyond the theory of B. Mandelbrot and K. Wilson and predominantly
deals with the behavior of elements and structures rather than with their appearance.

[0026] It is to be noted that there is a fundamental difference between knowledge and understanding. One can accumulate
arbitrary amounts of knowledge without having any understanding, while the converse is not possible. Knowledge
is the isolated representation of pure facts, while understanding arises from strong connections between isolated
facts, and from abstraction. The database described in R.V. Guha's paper has a large amount of knowledge but only a
small amount of understanding, because the links in this database appear in its network of topics, while all the individual
entries are not connected among each other.

[0027] The inventive knowledge database is a complex fractal hierarchical network of semantical units and pointers. 
[0028] Fractal hierarchical network: A network consists of nodes (here called semantical units) and links (here
called pointers) between the nodes. A network is called hierarchical if there are links of the type "... is kind of x" (hyponyms),
"x is kind of ..." (hypernyms), "... is part of x" (meronyms), and "x is part of ..." (holonyms) for a given node x,
where the first and second relation types group several nodes by their similarity to one new node, and the third and
fourth relation types group several nodes by their functional connection to one new node. Examples are: a desk chair,
an armchair, and a rocking chair are all grouped in the semantical unit 'chair' by their similarity (they are all chairs), while
a backrest, a leg, and a seat are all grouped in the semantical unit 'chair' by their functional connection (they are all functional
parts of a chair).

[0029] A hierarchical network is called fractal if the following four conditions are satisfied: 

• All nodes are similar (derived from one template).

• All links are similar (derived from one template).

• Links may also be nodes.

• Hierarchical links are possible, and at least one node must have a hierarchical link.
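
The four conditions can be illustrated with a small data-structure sketch. It is only one possible reading of the text: it assumes a single hypothetical template class `Unit` for all nodes and a subclass `Pointer` for all links, so that a link is itself a node and can be pointed at. The labels `kind_of`, `part_of`, and `acts_on` are illustrative and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed labels for the hierarchical link types of paragraph [0028].
HIERARCHICAL_KINDS = {"kind_of", "part_of"}


@dataclass(eq=False)
class Unit:
    """Single template from which every node is derived."""
    name: str
    pointers: list = field(default_factory=list)

    def link(self, target, kind, weight=1.0):
        """Attach a pointer from this unit to `target` and return it."""
        pointer = Pointer(name=kind, source=self, target=target, weight=weight)
        self.pointers.append(pointer)
        return pointer


@dataclass(eq=False)
class Pointer(Unit):
    """Single template for links; being a Unit, a link may also be a node."""
    source: Optional[Unit] = None
    target: Optional[Unit] = None
    weight: float = 1.0


def has_hierarchical_link(units):
    """True if at least one unit carries a hierarchical pointer."""
    return any(p.name in HIERARCHICAL_KINDS for u in units for p in u.pointers)


chair = Unit("chair")
armchair = Unit("armchair")
seat = Unit("seat")
is_a = armchair.link(chair, "kind_of")   # grouped by similarity
seat.link(chair, "part_of")              # grouped by functional connection

# Because a pointer is itself a unit, another unit can point at it.
note = Unit("remark on the association")
note.link(is_a, "acts_on")
```

Deriving `Pointer` from `Unit` is a design choice made here to mirror the statement that links may also be nodes; other representations are conceivable.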


[0030] The construction of a fractal hierarchical network, according to the present invention, is achieved as follows.
The network is given by a list of semantical units and pointers, as illustrated in Figure 1. There might be different types
of semantical units (objects, relations, and attributes, as defined later) and pointers (similarity pointers, functional pointers,
attribute pointers, and role pointers, also defined later). These pointers can be oriented upwards, downwards, or
horizontally (note that these directions are used to better define the hierarchical structure of the network, e.g. downward
pointers point to a lower hierarchical level). The various semantical units are interconnected in various ways.
Some of the pointers are hierarchical, representing the multiscale approach. Knowledge is represented in the knowledge
database as an associative network. According to the present invention, segments and/or semantical units derived from
an input network are matched with entries in the knowledge database. As discussed above, the knowledge database is
of hierarchical nature and all elements are alike, so that the network has a fractal hierarchical structure. Algorithms can
operate on elements at any hierarchical level in the same way, making them 'fractal' algorithms. In addition, every
semantical unit is linked to its associative semantical unit. These associative links reflect how one understands each
semantical unit. It is important to note that these pointers can exist between any two semantical units. The pointers
themselves may be regarded as semantical units that can have pointers to other semantical units, reflecting the fact that
something could act on the association between two semantical units rather than on the individual semantical units.
According to the present invention, the complex structure of the world is significantly simplified through the fractal
organization of the knowledge database. This also greatly simplifies the data entry into the knowledge database as one
only needs to properly define the individual semantical units, and the complex network is created automatically by the
individual definitions together with the possible inheritance rules originating from the fractal hierarchical structure.
[0031] Semantical units: A semantical unit is a set that contains one or several pieces of information. It may be represented
by a word, an object, a relation, an attribute, a combination of words and/or objects and/or relations and/or
attributes, a (hierarchical) network of words and/or objects and/or relations and/or attributes, a part of a sentence or a
whole sentence, a part of a paragraph or a whole paragraph, or a part of a story or a whole story.

[0032] Semantical units in the knowledge database: In the knowledge database semantical units are used as in
the above definition. A semantical unit is given by a word or a phrase (representing the name) and by all the pointers
attached to it. For the present implementation we define 3 types of semantical units: objects, relations, and attributes.
Note that it is also possible to define a larger or smaller number of semantical units, e.g. such that attributes are part of
objects or relations.

• object: Semantical units of this type correspond to individual semantical units that exist independently of other
semantical units. Every object might have a set of pointers to other objects. Each pointer may have a weight corresponding
to the semantical distance of the two objects it connects. Every object might have a set of pointers to
other relations which correspond to the possible relations the object can play a role in. Each pointer might have a
weight corresponding to the semantical distance of the object and the possible relation it can play a role in. Every
object might have a set of pointers to other attributes which correspond to the possible attributes the object can
take. Each pointer might have a weight corresponding to the semantical distance of the object and the possible
attribute (i.e. the importance of the attribute for the object).

Note that each of these pointers is in fact a special type of relation and can thus be pointed at by other semantical
units. This reflects the fact that some semantical unit may take influence on a possible relation or possible
attribute of an object.

• relation: Semantical units of this type correspond to semantical units that represent relations of any kind between
semantical units of all types. Every relation might have a set of pointers to other relations. Each pointer may have
a weight corresponding to the semantical distance of the two relations it connects. Every relation might have a set
of pointers to other objects which correspond to the possible roles the object can play in the relation. Each pointer
might have a weight corresponding to the semantical distance of the relation and the possible role (i.e. the importance
of the role for the relation). Every relation might have a set of pointers to other attributes which correspond to
the possible attributes the relation can take. Each pointer might have a weight corresponding to the semantical distance
of the relation and the possible attribute (i.e. the importance of the attribute for the relation).

Note that here also each of these pointers is in fact a special type of relation and can thus be pointed at by
other semantical units.

• attribute: Semantical units of this type correspond to semantical units that represent detailed information about
particular states of objects and relations. Every attribute might have a set of pointers to other attributes. Each
pointer may have a weight corresponding to the semantical distance of the two attributes it connects. Every
attribute might have a set of pointers to possible values of the attribute. Each pointer might have a weight corresponding
to the semantical distance of the attribute and the possible value. Values are attributes. They may be
arranged on a one- or multidimensional scale to represent their mutual semantical arrangement. Time and space
may be attributes. If pointed at by an object, they refer to the time and space when and where an object exists or is
valid; if pointed at by a relation they refer to the time and space when and where a relation takes place; and if
pointed at by an attribute they refer to the time and space when and where a state is assumed.

Note that here also each of these pointers is in fact a special type of relation and can thus be pointed at by
other semantical units.

[0033] Classes of pointers: The pointers can be viewed as directed associative connections between semantical
units. Some of them establish the hierarchical structure. The knowledge database according to the present invention
might comprise the following classes of pointers:

• Hierarchical pointers: There are two kinds of hierarchical pointers (see Figure 1): hierarchical classes (similarity
pointers) and hierarchical structures (functional pointers). Both kinds can point either in upward or downward direction,
corresponding to hierarchical associative connections.

• Horizontal pointers: There are two kinds of horizontal pointers (see Figure 1): similarity and functional pointers,
corresponding to non-hierarchical associative connections.

• Attributional pointers: This is one kind of pointer, corresponding to possible attributional associative connections
(e.g. semantical units pointing at other semantical units which can be their possible attributes). Note that an attributional
pointer may be regarded as a special kind of horizontal or hierarchical pointer.

• Role pointers: This is one kind of pointer, corresponding to possible role associations (e.g. semantical units
pointing at other semantical units which can occupy their possible roles). Note that a role pointer may be regarded
as a special kind of horizontal or hierarchical pointer.

[0034] All pointers may be regarded as semantical units and, therefore, might have the same classes of pointers
attached to themselves. This corresponds to the complexity of associations in the real world. 

[0035] The structure of the inventive knowledge database extends the object-oriented concept in the following sense.
One might have an object (or class) "car" in the inventive knowledge database, and in a given input string one may find
an instance of this class, a specific car, say, "Mr. Dent's Ford". Then (as an instance) "Mr. Dent's Ford" carries all the
data and member functions of the class "car". However, not all data may be specified; for instance, the color may not
be specified and it must not be set to any default value (e.g. red) by the constructor. Even worse, "Mr. Dent's Ford" may
carry data that is not defined in the class "car", because this knowledge is not yet known to the knowledge database.
So the object "Mr. Dent's Ford" is only what is called an "approximate" instance of the class "car".
[0036] Another general problem is inheritance. Only subclasses can inherit definitions from their superclasses. So if
one wants to employ the concept of inheritance in strict object-oriented terminology, all entries in the inventive knowledge
database must be individual classes, some of them subclasses of others. However, this does not allow for "horizontal/associative"
connections because two classes cannot be connected by a link (only their instances can).
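
The notion of an "approximate" instance can be made concrete with a short sketch. It is a simplified illustration under stated assumptions, not the structure used by the invention: every entry is the same kind of object (here the hypothetical class `Concept`), inheritance follows an upward similarity pointer named `kind_of`, unspecified attributes stay unset instead of receiving a default, and data unknown to the parent is accepted. The attribute names `make` and `dented_fender` are invented for the example.

```python
class Concept:
    """One template for both the 'class' (car) and the 'instance' (a specific car)."""

    def __init__(self, name, possible_attributes=(), kind_of=None):
        self.name = name
        self.possible_attributes = set(possible_attributes)
        self.kind_of = kind_of   # upward similarity pointer, may be None
        self.values = {}         # only what the input string actually states

    def inherited_attributes(self):
        """Possible attributes of this unit plus those of all its ancestors."""
        attributes = set(self.possible_attributes)
        if self.kind_of is not None:
            attributes |= self.kind_of.inherited_attributes()
        return attributes

    def set(self, attribute, value):
        # Data not (yet) known to the parent concept is accepted as well.
        self.values[attribute] = value

    def get(self, attribute):
        # No default is invented: None means "not specified".
        return self.values.get(attribute)

    def unknown_to_parents(self):
        """Stated data that no ancestor defines."""
        return set(self.values) - self.inherited_attributes()


car = Concept("car", {"color", "make"})
ford = Concept("Mr. Dent's Ford", kind_of=car)
ford.set("make", "Ford")
ford.set("dented_fender", True)   # hypothetical extra knowledge
```

Here the color of "Mr. Dent's Ford" remains unspecified, and `dented_fender` is carried by the instance although "car" does not define it.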

[0037] First embodiment: According to the present invention, an input string 12 (e.g. a text or a speech) is transformed
into a semantic network 13 of semantical units, as illustrated in Figure 2A. This might be done by a semantic
processor 14 in conjunction with a knowledge database 11. The semantic processor 14 transforms the resulting semantic
network 13 into an appropriate output 15 (e.g. a text or a speech or a reaction).

[0038] Second embodiment: The second embodiment adds one or several persons and/or one or several machines
of type 10 to the first embodiment, closing the interaction cycle so that an extended communication can take place, as
illustrated in Figure 2B.

[0039] Third embodiment: According to the present invention, an input string 12 (e.g. a text or a speech) is transformed
into a formal network 18 (herein also referred to as input network) of semantical units, as illustrated in Figure
2C. This might be done by a semantic preprocessor 17. There are various conventional techniques for this transformation,
as will be addressed below.

[0040] Semantic preprocessor: The semantic preprocessor 17 transforms an input string 12 into an input network
18 (formal network). A semantic preprocessor, as used in connection with a speech recognition system for example,
might consist of four parts: 


I. Voice recognition software to transform speech into an input string. This feature is optional as the input data may 
already be presented in written form. 

II. Syntactic (Chomskyan) parser to create a constituent structure of the input string.

III. Grammatical parser to create a functional structure of the input string's constituent structure. The grammatical
theories of Lexical Functional Grammar or Generalized Phrase Structure Grammar provide possible frameworks for this
step.

IV. Transformer (scope analyzer) to resolve language issues such as pronouns, direct and indirect speech, relative
clauses etc. It creates the input (formal) network from the functional structure. The logical Discourse Representation
Theory provides one possible framework for this step.

[0041] The behavior of a semantic preprocessor is explained below in connection with an example illustrated in Figures
3A-3C:

Speech: "Mike was a young boy. Every morning he walked to school." 

Input string: Mike was a young boy. Every morning he walked to school.

Constituent structure: Parser creates tree structure. Since there are two separate sentences, the parser creates
two trees, as illustrated in Figure 3A.

Functional structure: Parser identifies functions of words, as schematically illustrated in Figure 3B.

Formal network: Transformer resolves pronoun 'he'. An example of a formal network is illustrated in Figure 3C.

[0042] As shown in Figure 3A, there is one tree for each sentence (S). Each of the trees is subdivided into a noun
phrase (NP) and verb phrase (VP). The tree representing the first sentence (on the left hand side of Figure 3A) is further
divided into a noun (N) branch, a verb (V) branch, and another noun phrase (NP) which has a determiner (DET), a modifier
(MOD), and a noun (N). The second sentence, as represented on the right hand side of Figure 3A, has one branch
with a prepositional phrase (PP) and a noun (N), and one branch with a verb (V) and another prepositional phrase (PP). 
Both prepositional phrases (PP) have a preposition (P) and a noun (N). 
[0043] In a next step, the function of the words is identified, as schematically illustrated in Figure 3B. The following
abbreviations are used in this Figure: PRED (predicate); SUBJ (subject); OBJ (object); NUM (number); SING (singular);
PL (plural); PERS (person); 3 (third person); DEF (definiteness); + (definite); - (indefinite); SPEC (specification); MOD
(modifier); PCASE (prepositional case).

[0044] Finally, the semantic preprocessor 17 generates an input network (formal network) 18, which is shown in Figure
3C. This network comprises three objects 30-32. There is an upward similarity pointer 36 and a corresponding
downward similarity pointer 37 between the first object 30 (Mike) and the second object 31 (boy) which indicates that
'Mike' is 'a kind of' 'boy'. The second object 31 (boy) has an attribute pointer 38 which points to the attribute 34 (young).
There is a horizontal functional pointer 39 between the first object 30 (Mike) and the relation 33 (walk to), and a role
pointer 40 in the reverse direction, which is at the same time a relation (agent). An attribute pointer 43 points to the time
attribute 35 (every morning). A functional pointer 41 points from the third object 32 (school) to the relation 33 (walk to),
and a role pointer 42 in the reverse direction, which is at the same time a relation (location). It is to be noted that the
above details are merely given to describe how an input network can be obtained and what its structure might be.
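
The input network just described can be written down as plain data, which may help in following the reference numerals. The sketch below uses the numerals from the text; the tuple layout, the class labels such as `similarity/up`, and the helper `targets` are assumptions made for illustration. The text does not state where attribute pointer 43 originates; the relation 33 is assumed here.

```python
# Semantical units of the input network, keyed by reference numeral.
units = {
    30: ("object", "Mike"),
    31: ("object", "boy"),
    32: ("object", "school"),
    33: ("relation", "walk to"),
    34: ("attribute", "young"),
    35: ("attribute", "every morning"),
}

# Pointers as (numeral, pointer class, source unit, target unit).
pointers = [
    (36, "similarity/up", 30, 31),      # Mike is a kind of boy
    (37, "similarity/down", 31, 30),
    (38, "attribute", 31, 34),          # boy -> young
    (39, "functional", 30, 33),         # Mike -> walk to (horizontal)
    (40, "role:agent", 33, 30),         # reverse direction of 39
    (41, "functional", 32, 33),         # school -> walk to
    (42, "role:location", 33, 32),      # reverse direction of 41
    (43, "attribute", 33, 35),          # assumed source: relation 33
]


def targets(source, pointer_class):
    """Names of the units reached from `source` via pointers of one class."""
    return [units[t][1] for _, cls, s, t in pointers
            if s == source and cls == pointer_class]
```

For example, `targets(33, "role:agent")` returns the unit that occupies the agent role of 'walk to'.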
[0045] Once the input network 18 is created, the semantical processing commences. This can be done following any
one of three procedures that lead to an equivalent result, the "resulting semantic network" 13. 


1) The semantic processor 19 takes the input network 18 and locates a subset in the knowledge database 11 that
is deemed to be the best fit for all semantical units in the input network 18. This subset is then called the resulting
semantic network 13.

2) The semantic processor 19 takes the input network 18 and expands it with related semantical units from the
knowledge database 11. This expanded input network is then called the resulting semantic network 13.

3) The semantic processor 19 creates a new fractal hierarchical network of semantical units and pointers from the
input network 18 and the knowledge database 11, where the components are selected according to a matching
algorithm. The newly created fractal hierarchical network is then called the resulting semantic network 13.
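
As a rough illustration of the second procedure, the following sketch expands a list of input units with directly related units from a knowledge database. The matching algorithm is not specified at this point in the text, so the sketch makes simplifying assumptions: units are matched by name, only direct associations are considered, and a fixed threshold on the weight decides what is kept. The function `expand` and the sample entries in `knowledge` are hypothetical.

```python
def expand(input_units, knowledge, threshold=0.5):
    """Attach to each input unit its sufficiently strong associations.

    `knowledge` maps a unit name to {related unit name: weight}.
    Units without a match in the knowledge database are kept unexpanded.
    """
    result = {}
    for name in input_units:
        related = knowledge.get(name, {})
        result[name] = {other: weight
                        for other, weight in related.items()
                        if weight >= threshold}
    return result


# Illustrative entries only; they are not taken from the specification.
knowledge = {
    "boy": {"child": 0.9, "school": 0.6, "beard": 0.1},
    "school": {"teacher": 0.8, "building": 0.7},
}
network = expand(["Mike", "boy", "school"], knowledge)
```

In this toy run 'Mike' has no entry of its own and is therefore carried over without additions, while the weak association between 'boy' and 'beard' is dropped.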

[0046] The resulting semantic network 13 (created by any of the above processes) reflects both the general meaning
and the individual aspects of the input string 12 and is, like the knowledge database 11, represented by a fractal hierarchical
network of semantical units and pointers. The creation of the resulting semantic network 13 by any of the above
processes is performed by a matching algorithm with data in the knowledge database 11 and by information extraction
from this knowledge database 11. This is only possible if the structures of the resulting semantic network 13 and the
knowledge database 11 are similar.

[0047] There are different ways to implement the semantic processor 19. The actual implementation depends on
whether one wants a subset to be defined within the knowledge database 11 (1st implementation), the input
network 18 to be expanded (2nd implementation), or a new network 13 to be generated (3rd implementation).
[0048] Semantic processor: Note that the implementation of the semantic processor depends on the environment
in which it is going to be used. The semantic processor of the first and the second embodiment (Figs. 2A and 2B,
respectively) differs from the semantic processor of the third embodiment (Fig. 2C).

[0049] According to the first and second embodiment (Figs. 2A and 2B), the semantic processor 14 creates a first
guess of the resulting semantic network 13 by assigning semantical units, roles (such as "agent", "object",
"source/destination", "instrument", "location", etc.) and attributes to segments or individual semantical units of the input string 12.
Then the semantic processor 14 reads out the possible subsets from the knowledge database 11 that are associated
with the various segments or semantical units of the input string 12. It performs a matching of semantical units,
attributes, and roles of the above guess with information from the knowledge database 11 through inheritance,
implementation and overwriting rules.

[0050] According to the third embodiment (Fig. 2C), the semantic processor 19 creates a first guess of the resulting
semantic network 13 by assigning semantical units, roles (such as "agent", "object", "source/destination", "instrument",
"location", etc.) and attributes to the individual semantical units discovered by the semantic preprocessor 17. Then the
semantic processor 19 reads out the possible subsets from the knowledge database that are associated with the
various objects of the input string 12. It performs a matching of semantical units, attributes, and roles of the above guess
with information from the knowledge database 11 through inheritance, implementation and overwriting rules.
[0051] Prejudgement: As a next step, a "theme"-matching (or prejudgement) might be carried out by the semantic
processor 14 or 19. It chooses a theme from the set of possible "themes", which influences the semantical distances of
semantical units in the knowledge database 11, and verifies how well a segment or semantical unit of the input string
matches with its (suspected) counterpart in the abstract knowledge database. Note that for this evaluation the semantic
processor 14 or 19 retrieves the requested information (the local network around the suspected semantical unit) from
the knowledge database 11. For this purpose, the semantic processor 14 or 19 uses the following information: a
semantical unit (an entry in the knowledge database 11), a possible theme (defines how to adjust the weights), and a threshold
(defines where to cut off the network around the given semantical unit in the knowledge database 11). Then the
retrieved local network from the knowledge database 11 and the given concrete segment or semantical unit from the
input string are compared to find out just how close the second comes to being an instance of the first. This measure
('fitness') yields a likelihood for the semantical unit classification (which is of course 'theme'-dependent). Finally, the
retrieved local networks are compared to find the best combination if multiple possibilities (i.e. multiple matching
semantical units in the knowledge database) were encountered. The whole task is done iteratively for all known
semantical units and themes until a good (preferably the best) match is found, e.g. the resulting semantic network 13
converges to a stable state (the "meaning").

[0052] According to the third embodiment of the present invention, the information contained in the input network 18
is expanded by adding knowledge from the knowledge database 11.

[0053] To avoid adding the whole content of the knowledge database 11, the expansion process might be
self-controlled by a theme-prejudgement mechanism (derived e.g. by condensing semantical units into more abstract
semantical units, or by counting the numbers of connections at individual semantical units in the input network 18 or semantic
network 13). The prejudgement might be continuously updated and can even be dramatically corrected if a
contradiction or a change of theme is discovered. In addition, it determines and/or alters the weights in the knowledge database
11, so that if, for instance, a semantical unit's link to another semantical unit is increased, then the second semantical
unit's neighbors from the knowledge database 11 will also be added to the resulting semantic network 13 since they
may be relevant within the (currently supposed) theme (semantic enhancement). Finally, if an input string 12 is rather
long, then the prejudgement might even have a hierarchical structure (i.e. abstracts of abstracts).
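
One of the mechanisms mentioned in this paragraph, counting the connections at individual semantical units, might be sketched as follows. This is an illustration only: the function name `guess_theme`, the edge-list representation of the input network, and the example units are assumptions made for this sketch and are not taken from the description.

```python
from collections import Counter

def guess_theme(edges):
    """Return the most connected semantical unit and its connection count.

    `edges` is a list of (unit_a, unit_b) pairs; this representation is a
    hypothetical stand-in for the pointers of an input network.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # most_common(1) yields a one-element list holding (unit, count).
    unit, count = degree.most_common(1)[0]
    return unit, count

# Illustrative input network (made-up links between semantical units).
edges = [("car", "gear"), ("car", "house"), ("car", "accident"), ("gear", "reverse")]
theme, links = guess_theme(edges)
```

In this toy network the unit 'car' has the most connections and would be proposed as the theme; a real prejudgement would be revised as further sentences arrive.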

[0054] Resulting semantic network: The resulting semantic network 13 consists of self-similar semantical units and
pointers, giving it the structure of a fractal hierarchical network. Preferably, the resulting semantic network 13 has a
structure similar to the one of the knowledge database 11, thus allowing a comparison between the two. All possible
interpretations and (re)actions of the inventive system to textual information contained in an input string 12 are part of the
knowledge database, so that comparing the resulting semantic network 13 to the knowledge database 11 is equivalent
to understanding the meaning of the textual information contained in the input string 12.

[0055] Once the original information carried by the input string 12 is analyzed, as described above, one might:

• provide a summary or abstract which summarizes the original information according to predefined rules;

• fill in fields of a questionnaire about the input string 12 (such as a form, for example) with appropriate information;

• answer individual questions;

• pose questions, e.g., to better understand the situation;

• engage in a discourse with a user, for example, to resolve otherwise unresolvable ambiguities;

• exchange information with another system, or retrieve information from another system;

• extract meaning;

• take steps or measures depending on the interpretation of the original information, i.e., the understanding of the
meaning of the input string 12 can result in an adequate reaction of a system, which could be a predetermined
action triggered if a particular textual information is determined to be comprised in the input string 12.

[0056] This may be done by a semantic postprocessor 21 that creates an appropriate output 15.
[0057] It is to be noted that the result of the inventive approach depends on the quality of the knowledge database 11.
It is immediately obvious that a system for the processing of insurance claims requires an appropriate knowledge
database. If one tried to interpret the original information carried in an input string derived from a letter which was sent by
an insured person to his insurance agency in the light of a music knowledge database, the result would most likely be
nearly useless.

[0058] In the following, an exemplary algorithmic description of a semantic processor is given. The algorithm below 
might be used for processing a sentence. 

==================== beginning of algorithmic description ====================

For all semantical units in input string (omit "be" and "have")   // suppose there are N semantical units
{
    For all fitting knowledge database entries (string match)     // suppose there are n_i (1 <= i <= N)
    {
        Create semantical unit instance
        Inherit all possible attributes from knowledge database
        If (object) inherit all possible relations from knowledge database (including attached roles)
        If (relation) inherit all possible role objects from knowledge database

        Compute isolated fitness:
        {
            f_i = 1/sqrt(n_i)
            Adjust for implemented attributes: good fit: +25% rel., bad fit: -10% rel.
            Adjust for implemented relations:  good fit: +70% rel., bad fit: -50% rel.
            Adjust for implemented roles:      good fit: +10% rel., bad fit: -50% rel.
        }

        Find local neighborhood:
        {
            Set potential of semantical unit to sqrt(k/m)
                // k = # appearances of semantical unit in story up to present sentence
                // m = total number of semantical units in story up to present sentence
            Propagate potential across weights in knowledge database
            Attach everything above min. threshold t_min = 0.3
        }
    }
}   // yields n_1 + ... + n_N local neighborhoods

For all combinations of local neighborhoods   // there are n_1 * ... * n_N combinations; c = index of the combination
{
    Compute local fitness adjustment (cellular automaton method):
    {
        Count how many double, triple, etc. overlaps of objects (l_2(t), ..., l_N(t))
            // Note that the l_i depend on threshold t
        lfa_c = max_{t_min <= t <= 1} { 1/2 + 1/pi ArcTan(20 t - 10 + 2 sqrt(sum_{i=2..N} i l_i(t) / N)) }
            // May improve this formula by making it "source" dependent,
            // i.e. overlaps between subject and predicate count more than others.
    }

    Adjust fitness of each semantical unit in combination with calculated local fitness adjustment

    Compute total fitness of combination:
    {
        F_c = (30% pred. fitness + 30% subj. fitness + 20% obj. fitness + 20% other)
            // all of the above relative to 100% in case any of the categories is
            // missing or multiply present.
    }
}

Pick combination with highest total fitness as correct sentence network

Connect semantical units according to predicate-argument structure of input network

==================== end of algorithmic description ====================
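
The isolated-fitness step of the description above might be sketched in Python as follows. The sketch assumes that a bonus of "+x% rel." closes x% of the remaining distance between the fitness and 1.0; this reading reproduces the value 0.7803 used in the worked example for sentence I. How a malus ("-x% rel.") is applied is not stated in the text, so the proportional reduction below is purely an assumption. The names `ADJUSTMENTS`, `adjust_relative`, and `isolated_fitness` are hypothetical.

```python
import math

# Relative adjustments from the algorithmic description, as fractions.
ADJUSTMENTS = {
    "attribute": {"good": 0.25, "bad": -0.10},
    "relation": {"good": 0.70, "bad": -0.50},
    "role": {"good": 0.10, "bad": -0.50},
}

def adjust_relative(fitness, fraction):
    """Apply a relative bonus or malus to a fitness in [0, 1]."""
    if fraction >= 0:
        # Bonus: close the given fraction of the remaining gap to 1.0.
        return fitness + fraction * (1.0 - fitness)
    # ASSUMPTION: a malus removes the given fraction of the current fitness.
    return fitness + fraction * fitness

def isolated_fitness(n_matches, implemented):
    """Fitness of one candidate entry.

    n_matches:   number of fitting knowledge database entries (n_i)
    implemented: list of (kind, quality) pairs, e.g. ("attribute", "good")
    """
    fitness = 1.0 / math.sqrt(n_matches)
    for kind, quality in implemented:
        fitness = adjust_relative(fitness, ADJUSTMENTS[kind][quality])
    return fitness

# Sentence I of the example: 'plant' has two candidates and implements one
# possible attribute; 'meadow' has one candidate and implements nothing.
plant = isolated_fitness(2, [("attribute", "good")])
meadow = isolated_fitness(1, [])
```

With these inputs the sketch gives approximately 0.7803 for 'plant' and 1.0 for 'meadow', matching the values stated for sentence I.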



[0059] The operation of a system according to the three embodiments of the present invention is described in
connection with the following text comprising three sentences. The information contained in these three sentences is
expanded using knowledge from a knowledge database 11 which is shown in Figure 4. As can be seen from this Figure,
the knowledge database comprises the semantical units and pointers illustrated in Figure 1.
[0060] Input string comprising three sentences:

I. There is an old plant on a meadow.

II. Weeds are already growing next to it.

III. Because the plant is ugly, people will tear it down.

[0061] Processing of sentence I (comments are included in brackets [...]):

objects:   predicate: be <SUBJ> <LOC>
           subject: plant      [first semantical unit]
               attribute: old
           location: meadow    [second semantical unit]
               preposition: on

N = 2, n_1 = 2, n_2 = 1    [There are two objects, i.e. N = 2. There are two 'plants' in the
                            knowledge database 11 (referred to as plant1 and plant2), i.e. n_1 = 2.
                            The word 'meadow' appears only once, i.e. n_2 = 1. Figures 5, 7, and 9
                            show the inherited possible attributes and relations of both
                            semantical units.]

Compute isolated fitness and potential (plant):

    f_1 = 1/sqrt(2) + 25% rel. = 0.7071 + 0.0732 = 0.7803    [isolated fitness of plant1 and plant2.
                                                              Plant implements possible attribute 'age']
    p_1 = sqrt(1/2) = 0.7071                                 [isolated potential of plant1 and plant2]

Compute isolated fitness and potential (meadow):

    f_2 = 1/sqrt(1) + 0% rel. = 1.0    [isolated fitness of meadow. No implementations]
    p_2 = sqrt(1/2) = 0.7071           [isolated potential of meadow]

For propagation of potentials see Figures 6, 8, and 10.

There are 3 local neighborhoods (subsets) and 2 combinations.

Combination 1 (plant1 and meadow):

    no overlaps above t_min = 0.3    [there is no relation between plant1 and meadow;
                                      the threshold t_min is 0.3 in the present example]
    lfa_1 = 0                        [no change in fitness]
    F_1 = 60% subj. fit. + 40% obj. fit. = 0.6 * 0.7803 + 0.4 * 1.0 = 0.8682    [fitness of combination 1]

Combination 2 (plant2 and meadow):

    l_2 = 3 (0.3 <= t <= 0.3531); l_2 = 2 (0.3531 < t <= 0.3620);
    l_2 = 1 (0.3620 < t <= 0.4414); l_2 = 0 (0.4414 < t <= 1.0)    [for overlaps see Figures 8 and 10]
    lfa_2 = max = 0.7201
    f_1 = 0.7803 + 72.01% rel. = 0.7803 + 0.1582 = 0.9385    [improved fitness of plant2]
    f_2 = 1.0 + 72.01% rel. = 1.0 + 0 = 1.0                  [improved fitness of meadow]
    F_2 = 60% subj. fit. + 40% obj. fit. = 0.6 * 0.9385 + 0.4 * 1.0 = 0.9631    [fitness of combination 2]



[0062] Please note that the above three sentences (input string 12) are selected to show how the present invention
deals with information which is ambiguous. Sentence I might either refer to a living thing (plant2 in Figure 4), or a
building (plant1 in Figure 4). The above algorithms are defined and optimized such that the inventive system can determine
to which semantical units in the knowledge database 11 the segments or semantical units in the input string 12 are
associated. The system 10 processes the first sentence I. Based on the above equations, the system 10 determines
that it is more likely (F_2 > F_1) that in this first sentence I 'plant' refers to a living thing (plant2). This conclusion is mainly
influenced by the fact that there is no association in the knowledge database 11 between the object 'meadow' and the
object 'plant1'.

[0063] The inventive system 10 identifies a subset (local network) for each segment or semantical unit in the input
string 12. In the present example there are two subsets for the object 'plant'. The first subset is shown in Figure 6. Since
the threshold t_min is 0.3 in the present example, all semantical units outside the respective subset are suppressed. The
plant has an isolated fitness of 1/√2 = 0.7071 since two plants (plant1 and plant2) were found in the knowledge database
11, while the meadow has an isolated fitness of 1/√1 = 1.0 since only one meadow was found in the knowledge
database 11. All possible attributes and relations associated with the object 'plant' are illustrated in Figure 5. The words 'age'
and 'old' are implemented attributes of the object 'plant' and are shown as attributes in Figure 6. Note that there are no
relations implemented in the first sentence of the present example. According to the present invention, a classification
probability (adjusted isolated fitness) is calculated which gives an indication as to whether the plant in the input string
12 is likely to refer to plant1 or plant2 in the knowledge database 11. In the present example, the classification
probability is 1/√2 + bonus = 0.7803. The bonus is added because the attribute 'age' with value 'old' is a possible attribute
for both 'plant1' and 'plant2'. Next, the isolated potential of the object 'plant' is calculated to √(1/2) = 0.7071, because there
is one appearance of the object 'plant' and there are a total of two semantical units ('plant' and 'meadow') in the string
up to the present sentence I. The potential of 'plant1' is calculated by multiplying the classification probability and the
isolated potential of 'plant'. The potential of 'plant1' is 0.7803 x 0.7071 = 0.5518, which is above the threshold of 0.3. The
weight (semantic distance) assigned to the pointers between 'plant1' and 'building' is 0.8 in the present example. The
potential of 'building' is calculated to 0.5518 x 0.8 = 0.4414, which is also above the threshold of 0.3. The weight (semantic
distance) assigned to the pointers between 'building' and 'complex' is 1.0 in the present example, and the potential of
'complex' is calculated to 0.4414 x 1.0 = 0.4414, which is above the threshold of 0.3. The weight (semantic distance)
assigned to the pointers between 'complex' and 'entity' is 0.7 in the present example, and the potential of 'entity' is
calculated to 0.4414 x 0.7 = 0.3090, which is above the threshold of 0.3. The potential of all other semantical units is below
the threshold and these semantical units are thus deemed to be of no relevance. By means of the above calculations it
was shown how the potential propagates through the network until a subset (local network) is identified. Note that the
equations and algorithms can be modified.
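
The propagation of the potential described in this paragraph might be sketched as follows. The weights are those quoted for the 'plant1' branch; treating the pointers as bidirectional, the breadth-first traversal, and the names `WEIGHTS` and `local_network` are assumptions of this sketch, and the full knowledge database of Figure 4 is not reproduced.

```python
import math
from collections import deque

# Excerpt of the knowledge database: weights on the pointers of the
# 'plant1' branch as quoted in the text (assumed to be bidirectional).
WEIGHTS = {
    "plant1": {"building": 0.8},
    "building": {"plant1": 0.8, "complex": 1.0},
    "complex": {"building": 1.0, "entity": 0.7},
    "entity": {"complex": 0.7},
}

def local_network(weights, start, start_potential, threshold=0.3):
    """Propagate a potential across weighted pointers.

    Returns {unit: potential} for every unit whose potential stays at or
    above the threshold; everything else is cut off.
    """
    potentials = {start: start_potential}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        for neighbor, weight in weights.get(unit, {}).items():
            candidate = potentials[unit] * weight
            # Keep the highest potential reaching a unit; prune below threshold.
            if candidate >= threshold and candidate > potentials.get(neighbor, 0.0):
                potentials[neighbor] = candidate
                queue.append(neighbor)
    return potentials

# Classification probability (about 0.7803) times isolated potential (about 0.7071).
classification_probability = 1 / math.sqrt(2) + 0.25 * (1 - 1 / math.sqrt(2))
isolated_potential = math.sqrt(1 / 2)
net = local_network(WEIGHTS, "plant1", classification_probability * isolated_potential)
```

Up to rounding, the resulting potentials are 0.5518 for 'plant1', 0.4414 for 'building' and 'complex', and 0.3090 for 'entity', as in the text.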

[0064] The subset (local network) of plant2 is illustrated in Figure 8. This subset is identified using the same approach
as described in connection with Figure 6.

[0065] If one now compares the subsets illustrated in Figures 6 and 8, it is difficult to tell which one of the two
possibilities is a better representation of the textual information conveyed in the input string 12.

[0066] Finally, a third subset is identified which corresponds to the second object 'meadow' in the input string 12. This
subset is illustrated in Figure 10. The isolated potential of meadow is 0.7071. The classification probability of the object
'meadow' in the input network as the object 'meadow' in the knowledge database is 1/√1 = 1.0 since there is only one
semantical unit 'meadow' in this database. This yields a potential of 'meadow' in the knowledge database 11 that is also
0.7071. The semantic distance between 'meadow' and 'grassland' is 0.8 in the present example, and the potential of
'grassland' is calculated to 0.7071 x 0.8 = 0.5657. The semantic distance between 'grassland' and 'weed' is 0.8 in the
present example, and the potential of 'weed' is calculated to be 0.5657 x 0.8 = 0.4526. The semantic distance between
'weed' and 'plant2' is 0.8 in the present example, and the potential of 'plant2' is calculated to 0.4526 x 0.8 = 0.3620, which
is above the threshold. The semantic distance between 'grassland' and 'location' is 0.7 in the present example, and the
potential of 'location' is calculated to 0.5657 x 0.7 = 0.3960, which is above the threshold.

[0067] According to the present example we now have three subsets (local networks) for the two semantical units in
the input string 12. In a next step these subsets are combined to obtain a resulting semantic network. In order to
ensure that the resulting semantic network properly reflects what textual information was conveyed in the input string
12, the most likely combination of subsets has to be selected. There are different ways to do this. The approach
described here starts with deriving all possible combinations of the local networks. In the present example, these are
the combinations "plant1 + meadow" (combination 1) and "plant2 + meadow" (combination 2), since all semantical units
must appear exactly once in each combination. Then the overlaps in the local networks are determined for each
combination. In combination 1 there is no overlap, while in combination 2 the semantical units "weed", "grassland", and
"plant2" have an overlap (i.e. they appear in the local networks of both "plant2" and "meadow"). Therefore, the
semantical units in combination 2 earn a bonus (local fitness adjustment) calculated from the formula

$$\mathrm{lfa}_2 = \max_{t_{\min} \le t \le 1} \left\{ \frac{1}{2} + \frac{1}{\pi} \arctan\left( 20\,t - 10 + 2 \sqrt{\sum_{i=2}^{N} i\, l_i(t) / N} \right) \right\}$$

If one observes that for 0.3 = t_min <= t <= 0.3531 we have l_2(t) = 3, for 0.3531 < t <= 0.3620 we have l_2(t) = 2, for 0.3620 <
t <= 0.4414 we have l_2(t) = 1, and for 0.4414 < t <= 1.0 we have l_2(t) = 0, while l_i(t) = 0 for all i > 2, this formula yields lfa_2
= 0.7201, that is, the previous isolated fitnesses of meadow and plant2 are increased by 72.01% relative to 1.0 (i.e. by
72.01% of their remaining distance to 1.0), thus
yielding the respective values for "plant as plant2" of 0.9385 and "meadow as meadow" of 1.0. Finally, the total fitness
for each combination is calculated from the formula F_1 or F_2 = (30% predicate fitness + 30% subject fitness + 20%
object fitness + 20% other). Since the percentages are relative to 100% and there is only one grammatical subject
(plant) and one object (meadow), this yields the modified formulas and values F_1 = (60% plant as plant1 fitness + 40%
meadow as meadow fitness) = 60% x 0.7803 + 40% x 1.0 = 0.8682, and F_2 = (60% plant as plant2 fitness + 40%
meadow as meadow fitness) = 60% x 0.9385 + 40% x 1.0 = 0.9631 for combinations 1 and 2, respectively. Since F_2 >
F_1, the object "plant" is identified as "plant2", the living thing. This identification may be used to select a theme
(prejudgement) before processing the next sentence. Note that here also the equations and algorithms can be modified.
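
The selection between the two combinations might be sketched as follows. Several things here are assumptions rather than statements of the description: the local network of 'plant2' is reconstructed from the overlap thresholds quoted in the text (Figure 8 may contain further units), thresholds at which no overlap exists are taken to earn no bonus (consistent with lfa_1 = 0), and all function and variable names are hypothetical.

```python
import math

T_MIN = 0.3

# Local networks with (rounded) potentials from the worked example.
PLANT1_NET = {"plant1": 0.5518, "building": 0.4414, "complex": 0.4414, "entity": 0.3090}
PLANT2_NET = {"plant2": 0.5518, "weed": 0.4414, "grassland": 0.3531}  # reconstructed
MEADOW_NET = {"meadow": 0.7071, "grassland": 0.5657, "weed": 0.4526,
              "plant2": 0.3620, "location": 0.3960}

def overlap_counts(networks, t):
    """Return {i: l_i(t)}: units present with potential >= t in exactly i >= 2 networks."""
    counts = {}
    for unit in set().union(*networks):
        i = sum(1 for net in networks if net.get(unit, 0.0) >= t)
        if i >= 2:
            counts[i] = counts.get(i, 0) + 1
    return counts

def local_fitness_adjustment(networks, t_min=T_MIN):
    n = len(networks)
    # The expression grows with t while the overlap counts stay constant, so it
    # suffices to evaluate it at the potentials themselves.
    candidates = sorted({p for net in networks for p in net.values() if p >= t_min})
    best = 0.0
    for t in candidates:
        counts = overlap_counts(networks, t)
        if not counts:
            continue  # ASSUMPTION: thresholds without any overlap earn no bonus
        weighted = sum(i * l for i, l in counts.items())
        value = 0.5 + math.atan(20 * t - 10 + 2 * math.sqrt(weighted / n)) / math.pi
        best = max(best, value)
    return best

def adjust_relative(fitness, fraction):
    # Bonus relative to the remaining distance to 1.0.
    return fitness + fraction * (1.0 - fitness)

def total_fitness(subject_fitness, object_fitness):
    # 30% subject and 20% object, renormalised to 100% (no predicate, no "other").
    return 0.6 * subject_fitness + 0.4 * object_fitness

results = {}
for name, plant_net in (("plant1", PLANT1_NET), ("plant2", PLANT2_NET)):
    lfa = local_fitness_adjustment([plant_net, MEADOW_NET])
    results[name] = total_fitness(adjust_relative(0.7803, lfa), adjust_relative(1.0, lfa))
best = max(results, key=results.get)
```

Under these assumptions the sketch reproduces, up to rounding, F_1 = 0.8682 and F_2 = 0.9631 and therefore selects 'plant2'.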
[0068] If one now also processes the other two sentences II and III, the system gets additional information which
either leads to a reconsideration of a prior combination, or to a refinement of a combination. Note that the second
sentence II talks about weeds. For the system this furthers the supposition of sentence I that 'plant' refers to plant2 as a living
thing rather than a building. After having processed the third sentence III, this picture has to be revised, because this
sentence contains the relation 'tearing it down'. This expression is never used in connection with living things in the
object role (but only in the agent role). The third sentence thus seems to indicate that the first and second sentence
should have referred to a building rather than a living thing. With this new theme selection, sentences I and II may be
reprocessed. The combination of subsets is dynamically changed until all textual information conveyed by an input
string is processed. Due to this iterative approach it is possible to obtain a resulting semantic network that gives the best
possible representation of the information carried in the input string.

[0069] The second sentence does not add any information which helps to better understand the meaning of the first
sentence, because weeds can grow next to a building or next to a living plant. Since weeds and living plants are
somewhat related, at this point it seems more likely that the first sentence refers to a living plant. The third sentence finally
contains information which helps the system to 'understand' the meaning of the first and second sentences. It is clear
from the third sentence that in the other two sentences 'plant' refers to a building.

[0070] The present invention can be used to provide systems that respond by creating meaningful outputs. In the
following example the input is taken from a car accident report and output is automatically generated complying with
certain classifications, e.g. what kind of cars were involved, what type of accident took place, or whose fault it was. Such a
system might be used by insurance companies for the automated processing of insurance claims.

[0071] An example of an insurance case is used to illustrate further details and aspects of the inventive system. 
[0072] The following text from an insurance claim form is to be analyzed using the inventive system:

"I was attempting to reversing park vehicle in front of a house I was visiting. Unfortunately there was already
another car parked by the house. I tried to park close to vehicle and unfortunately put the vehicle into the wrong
gear, reverse and not park (automatic vehicle). There were no warnings."

[0073] The semantic preprocessor 17 creates an input network 18 using standard parsing techniques. The sentences
of the above input string are put into a predicate-argument structure, that is, the predicate has various roles attached
to it. The first sentence already exhibits a hierarchical structure, where the relative clause "I was visiting" is a modifier
of the location 'house', but 'house' is also the (non-existing, but referred) object in the relative clause, as is always the
case with relative clause constructions. A few straightforward transformations are done, such as splitting the third
sentence after the conjunction 'and' into two sentences and attaching the attribute 'automatic' to the proper noun 'vehicle'.
Finally, certain parts of the input network might be combined since the clause "reverse and not park" only explains what
"wrong gear" means.

[0074] The semantic processor 14 then acts upon this input network. It accesses the knowledge database 11 for
additional information, if necessary. In the present example an illustration is given of the fact that the vehicle is 'automatic'.
Here one needs to understand what 'automatic' means in conjunction with 'vehicle', and what the possible conclusions
are. To expand the information in the input network one makes use of the knowledge around 'car' in the knowledge
database 11. 'Car' is a kind of (has hypernym) 'vehicle'; 'transmission' is part of (has meronym) 'car', which in turn may be
'automatic' or 'manual'. If it is 'automatic' it has 'gears' which correspond to (interact with) a certain 'speed' of the object
'car'. In particular, 'gears' are 'park' or 'reverse'. The speed of the particular 'gear P' is zero (a value of the attribute
'speed'), while the speed of the particular 'gear R' is between zero and minus 10 km/h (a possible range of the value,
where the minus sign indicates the direction of movement).
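
The knowledge around 'car' described in this paragraph might be held in a structure along the following lines. This is a sketch only: the class `SemanticalUnit`, the helpers `add_unit`, `link`, and `hypernyms`, the attribute key `speed_kmh`, and the encoding of a value range as a (low, high) tuple are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticalUnit:
    name: str
    kind: str = "object"                            # "object", "relation", or "attribute"
    pointers: list = field(default_factory=list)    # (pointer_type, target_name) pairs
    attributes: dict = field(default_factory=dict)  # attribute name -> value or range

kb = {}

def add_unit(name, kind="object", **attributes):
    kb[name] = SemanticalUnit(name, kind, attributes=dict(attributes))

def link(source, pointer_type, target):
    kb[source].pointers.append((pointer_type, target))

def hypernyms(name):
    """Follow 'hypernym' pointers upwards and return the chain of names."""
    chain = []
    current = name
    while True:
        parents = [t for p, t in kb[current].pointers if p == "hypernym"]
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)

add_unit("vehicle")
add_unit("car")
add_unit("transmission", type_options=("automatic", "manual"))
add_unit("gear P", speed_kmh=(0, 0))     # parked: speed is zero
add_unit("gear R", speed_kmh=(-10, 0))   # reverse: minus sign gives the direction

link("car", "hypernym", "vehicle")        # 'car' is a kind of 'vehicle'
link("car", "meronym", "transmission")    # 'transmission' is part of 'car'
link("transmission", "meronym", "gear P")
link("transmission", "meronym", "gear R")
```

Typed pointers keep the "kind of" and "part of" relationships apart, so that an attribute such as the speed range of 'gear R' can be looked up when the input network is expanded.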

[0075] In addition, one may have an entry that 'have an accident' is a 'relation' between two 'cars' (objects). Any two
semantical units that exist as real things may have a relation of some kind, and that relation may have an attribute
'relative speed', which contains a formula such as 'equals absolute value of speed of semantical unit 1 minus speed of
semantical unit 2', where the individual speeds refer to the attributes 'speed' of the respective semantical units attached
to the relation. The relation 'have an accident' thus inherits the attribute 'relative speed', and in the formula the speeds
of the two semantical units are replaced with the speeds of the two cars. To obtain the 'relative speed' in the present case,
one knows that one car is parked, and the value of the attribute 'speed' of a car is set to zero if the car is parked (this is
stored in the formula to compute the speed of a car), so the speed of the first car is zero. For car two, one knows that it
is in reverse, so its speed is between zero and minus 10 km/h, and, therefore, the 'relative speed' is between 0 and 10
km/h.
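
The 'relative speed' formula can be evaluated on value ranges rather than single values. The sketch below does this with simple interval arithmetic; representing a speed as a (low, high) tuple in km/h and the function name `relative_speed_range` are assumptions of the sketch.

```python
def relative_speed_range(speed_a, speed_b):
    """Range of |a - b| for a in speed_a and b in speed_b.

    Each argument is a (low, high) tuple in km/h (hypothetical encoding).
    """
    # The difference a - b ranges over [low, high].
    low = speed_a[0] - speed_b[1]
    high = speed_a[1] - speed_b[0]
    if low <= 0 <= high:
        # The difference can be zero, so the absolute value starts at 0.
        return (0, max(-low, high))
    return (min(abs(low), abs(high)), max(abs(low), abs(high)))

parked = (0, 0)        # first car: parked, speed zero
reversing = (-10, 0)   # second car: in reverse
rel = relative_speed_range(parked, reversing)
```

For the two cars of the example this yields the range from 0 to 10 km/h stated above.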

[0076] Finally, the knowledge database may contain further information about 'automatic vehicles', such as: if 'gear' is
not 'P' or 'N', and the brake is not applied, then speed is at least 5 km/h (reflecting that automatic vehicles tend to crawl).
This scenario may be brought into the context: since there was an accident, the relative speed cannot have been zero,
so it is possible that the driver put the car into 'R' but did not apply the brakes, thus bumping into the parked car at
crawling speed.
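
The crawl rule of this paragraph might be applied to the relative-speed range as sketched below. The sketch assumes that "at least 5 km/h" refers to the magnitude of the speed and that, because the other car is parked, this floor carries over to the relative speed; the helper names `crawl_speed_floor` and `narrow` are hypothetical.

```python
def crawl_speed_floor(gear, brake_applied, floor_kmh=5):
    """Minimum speed magnitude implied by the rule, or 0 if it does not apply."""
    if gear not in ("P", "N") and not brake_applied:
        return floor_kmh
    return 0

def narrow(speed_range, floor):
    """Raise the lower bound of a (low, high) range to the given floor."""
    low, high = speed_range
    return (max(low, floor), high)

# Hypothesis from the text: gear 'R' engaged, brakes not applied.
narrowed = narrow((0, 10), crawl_speed_floor("R", brake_applied=False))
```

Under this hypothesis the relative speed lies between 5 and 10 km/h, which is compatible with the observation that an accident occurred (relative speed not zero).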

[0077] Once the textual information conveyed in the input string is expanded (transformed) into a resulting semantic
network, the insurance agent, for example, might ask questions which the system then can answer. Likewise, the
system might create a printout in a standardized form which gives an objective and informative description of what has
happened. The system might even propose actions. It might for example authorize the reimbursement of expenses which
somebody involved in the accident claimed.

[0078] Proposed are schemes and systems based on a special model of textual information and natural language.
According to our model, natural language as well as textual information consists of semantical units and pointers which
are grouped at different levels of hierarchy and are all of a similar type. In addition, we use weights to express the
semantical distance of two linked semantical units. Thus, the knowledge database, speech, and questions are all
represented in what is herein called a fractal hierarchical network. The local network of a speech or a question is created
by locating its semantical units, possible relations, possible attributes, and possible roles in the knowledge database
and copying the semantical neighborhoods from the knowledge database, whereby overlapping areas are increased.
Finally, the overlap of a speech and a question network yields a resulting semantic network which can be used to e.g.
generate a meaningful answer or reaction.

[0079] The present invention can also be used for data mining purposes. The inventive approach allows meaning to be
extracted from the textual information conveyed in input strings and can process huge amounts of information. It can
determine relationships and trends that were previously invisible or unclear. The inventive approach allows the meaning
of input strings of any length to be apprehended automatically with a previously unmatched quality.


Claims 

1. Method for the processing of textual information conveyed in an input string together with information contained in
a knowledge database representing a network of hierarchically arranged semantical units which are similar across
hierarchies, said method comprising the steps:

a. segmenting said input string into segments and

b. combining said segments with semantical units from said knowledge database to generate a resulting
semantic network of hierarchically arranged semantical units which are similar across hierarchies.

2. The method of claim 1, wherein at least one of said segments is deemed to be related to a semantical unit, or is
similar to a semantical unit, and wherein there are at least n semantical units in the knowledge database with n ≥ 2.

3. The method of claim 2, whereby a semantical unit is a set containing one or several pieces of information, such a
semantical unit preferably consisting of a name and pointers to other semantical units.

4. The method of claim 2, whereby step b. comprises the following steps

i. identifying a matching semantical unit in said knowledge database, said matching semantical unit being
deemed to be related to the j-th segment of said input string,

ii. determining the fitness of said matching semantical unit by taking into consideration said semantical unit's
possible associations,

repeating steps i. through ii. until a matching semantical unit is found for all j = 1, ..., m segments of said input string.

5. The method of claim 4, whereby the combining of said segments with said matching semantical units is done such
that the resulting semantic network comprises semantical units that are deemed to be related to the textual
information conveyed in said input string.

6. Method of any of the preceding claims, wherein said resulting semantic network is employed for the automated
apprehension of the textual information conveyed in the input string.

7. The method of claim 1, whereby said knowledge database and resulting semantic network comprise several types
of semantical units such as objects, relations, and attributes.

8. The method of claim 1, whereby said knowledge database and resulting semantic network comprise several types
of pointers.

9. The method of claim 4, whereby steps i. through ii. are carried out m times if there are m matching semantical units
in said knowledge database which are deemed to be related to the j-th semantical unit of said input string, and
whereby only one of said m matching semantical units is used to generate said resulting semantic network.

10. The method of claim 4, whereby said matching semantical units are combined with each other to form the resulting
semantic network if a corresponding classification probability indicates that the respective semantical units have
been correctly matched with the semantical units in said knowledge database.

11. The method of claim 4, whereby said semantical unit's possible associations are its possible attributes and/or
possible relations and/or possible roles.

12. The method of claim 4, whereby attributes and/or relations and/or roles of a semantical unit in the input string are 
used to determine whether a matching semantical unit in said knowledge database is better, meaning for example 
that it has a higher fitness than any other matching semantical unit in said knowledge database. 

13. The method of claim 4, whereby a bonus is added to increase the fitness if generally accepted attributes and/or
relations and/or rules are used in connection with a semantical unit. 

14. The method of claim 4, whereby a malus is subtracted to decrease the fitness if unusual attributes and/or relations 
and/or roles are used in connection with a semantical unit. 

15. The method of claim 1 or 4, whereby each semantical unit in the knowledge database receives a potential, said 
potential being a fixed or variable potential. 

16. The method of claim 8, whereby a pointer between two semantical units in the knowledge database has a weight
corresponding to some kind of semantical distance between the two semantical units, said weight being a fixed or
variable weight.

17. The method of claim 1, wherein the resulting semantic network is a network identified within said knowledge
database.

18. The method of claim 1, wherein the resulting semantic network is extracted from the knowledge database to form
a new separate network. 

19. The method of claim 2, wherein the input string is transformed into an input network comprising said semantical
units.

20. The method of claim 19, wherein the resulting semantic network is obtained from transforming said input network. 

21. The method of claim 8, wherein certain of the pointers describe the mutual relationships between semantical units.

22. The method of claim 16, wherein the inverse of said weight represents some kind of a semantical distance between
two semantical units it connects. 

23. The method of claim 4, wherein steps i. through ii. are repeated if an additional input string is received until a resulting
semantic network is generated.

24. The method of claim 1, wherein an input text is transformed into several input strings.

25. The method of claim 1, wherein a preliminary theme of the input string is determined allowing a quick identification
of a preliminary subset of information within the knowledge database. 

26. The method of claim 25, wherein said preliminary subset is revised from time to time or continuously, preferably if
a contradiction or change of theme is determined.

27. The method of claim 1, comprising the step of performing an action depending on the general meaning extracted 
from said input string. 

28. The method of claim 1, wherein an action is performed by providing an answer if the input string was determined
to comprise a question. 

29. The method of claim 1, wherein the knowledge database comprises a self-similar representation of semantical 
units and pointers across different scales. 

30. The method of claim 4, wherein self-similar algorithms are used when carrying out steps i. through ii. 

31. The method of claim 1, wherein the knowledge database reflects the structure of an environment, said environment
preferably being a subset of the real world. 

32. The method of claim 8, wherein at least one of said pointers is a directed associative connection between seman- 
tical units. 

33. The method of claim 8, wherein at least one of said pointers is a hierarchical pointer, or a horizontal pointer, or a
similarity pointer, or a functional pointer, or an attributional pointer, or a role pointer.

34. The method of claim 1, wherein text is transformed into input strings which are then processed string-by-string.

35. The method of claim 1, wherein speech is transformed into input strings which are then processed string-by-string.

36. The method of claim 35, wherein speech recognition software is employed to transform said speech into said input 
strings. 

37. The method of claim 19, wherein the input network is generated by means of syntactic parsing and/or grammatical
parsing.

38. Method for the construction of a fractal hierarchical knowledge database, comprising the steps: 

• recording semantical units,
• linking said semantical units by means of pointers with other semantical units of the fractal hierarchical
knowledge database,
• assigning a weight to said pointer.

39. The method of claim 38, wherein said pointer is a hierarchical pointer or a horizontal pointer, and said hierarchical
or horizontal pointer is a similarity pointer, or a functional pointer, or an attributional pointer, or a role pointer.

40. The method of claim 38, wherein there are various types of pointers which describe the associations between 
semantical units. 

41. The method of claim 38, wherein the pointers themselves are regarded as semantical units.

42. The method of claim 38, wherein the weight assigned to the pointer is fixed or variable. 

43. The method of claim 38, wherein a fixed or variable potential is assigned to a semantical unit, preferably when the
fractal hierarchical knowledge database is used for the automated apprehension of textual information conveyed in
an input string.

44. The method of claim 43, wherein a fixed or variable threshold is assigned to said fractal hierarchical knowledge
database.

45. The method of claim 44, wherein the potential and threshold are used for subset determination.

46. The method of claim 38, wherein the inverse of the weight of a pointer represents some kind of a semantic distance
between two semantical units it connects. 

47. Apparatus for the processing of textual information conveyed in an input string together with information contained 
in a knowledge database representing a network of hierarchically arranged semantical units which are similar 
across hierarchies, said apparatus comprising: 

• means for segmenting said input string into segments,
• memory for storing said segments,
• a semantic processor for combining said segments with semantical units from said knowledge database to
generate a resulting semantic network of hierarchically arranged semantical units which are similar across
hierarchies,
• memory for storing said resulting semantic network.

48. The apparatus of claim 47, wherein at least one of said segments is deemed to be related to a semantical unit, or 
identical with a semantical unit, and wherein there are at least n semantical units with n ≥ 2.

49. The apparatus of claim 48, wherein said semantical processor comprises 

• means for identifying a matching semantical unit in said knowledge database, said matching semantical unit
being deemed to be related to the j-th segment of said input string, such that a matching semantical unit is found
for all j = 1, ..., m segments of said input string, and
• means for determining the fitness of said matching semantical unit by taking into consideration said semantical
unit's possible associations.

50. The apparatus of claim 49, comprising a speech processing unit, preferably comprising a speech recognition module,
which transforms speech into said input string.

51. The apparatus of claim 47 being designed for the automated apprehension of the textual information conveyed in 
the input string. 

52. The apparatus of claim 51 comprising means which trigger a reaction depending on the apprehended information. 

53. The apparatus of claim 47 comprising a post-processor which transforms the resulting semantic network to create
an output string. 

54. The apparatus of claim 47, wherein the resulting semantic network is represented as a fractal hierarchical network 
of semantical units. 

55. The apparatus of claim 48 comprising

a semantic preprocessor for processing said input string to generate an input network which comprises said 
semantical units. 
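
The fitness determination of claims 9 and 11 to 14 and the weighted pointers, potentials and thresholds of claims 15, 16, 22 and 38 to 46 may be illustrated by the following minimal Python sketch. It is offered only as one possible reading: the names `SemanticalUnit`, `Pointer`, `fitness`, `best_match` and `select_subset`, the numeric bonus, malus and threshold values, and the rule by which a potential is damped along a pointer are all assumptions made for illustration and are not taken from the claims.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """Directed, weighted link to another semantical unit (claims 16, 38)."""
    target: str
    kind: str      # e.g. "hierarchical", "attributional", "role"
    weight: float  # fixed in this sketch; the claims also allow a variable weight

    @property
    def distance(self) -> float:
        # Claims 22 and 46: the inverse of the weight acts as a semantical distance.
        return 1.0 / self.weight


@dataclass
class SemanticalUnit:
    name: str
    potential: float = 1.0                       # claim 15: fixed or variable potential
    pointers: list = field(default_factory=list)
    usual: set = field(default_factory=set)      # generally accepted associations
    unusual: set = field(default_factory=set)    # known but unusual associations


def fitness(unit, observed, bonus=1.0, malus=0.5):
    """Hypothetical scoring rule: bonus per accepted association (claim 13),
    malus per unusual association (claim 14)."""
    score = 0.0
    for assoc in observed:
        if assoc in unit.usual:
            score += bonus
        elif assoc in unit.unusual:
            score -= malus
    return score


def best_match(candidates, observed):
    """Claim 9: of several matching units, only the fittest one is used."""
    return max(candidates, key=lambda u: fitness(u, observed))


def select_subset(db, start, threshold):
    """Assumed reading of claims 43 to 45: spread the start unit's potential
    along pointers, damp it by pointer distance, keep units at or above threshold."""
    subset = {start: db[start].potential}
    frontier = [start]
    while frontier:
        name = frontier.pop()
        for p in db[name].pointers:
            value = subset[name] / (1.0 + p.distance)
            if value >= threshold and value > subset.get(p.target, 0.0):
                subset[p.target] = value
                frontier.append(p.target)
    return subset


# Illustrative data: two candidate units for the segment "bank".
finance = SemanticalUnit(
    "bank_finance",
    usual={"money", "account"},
    unusual={"water"},
    pointers=[Pointer("money", "attributional", 2.0),
              Pointer("river", "horizontal", 0.1)],
)
river_bank = SemanticalUnit("bank_river", usual={"water", "shore"}, unusual={"money"})
db = {"bank_finance": finance,
      "money": SemanticalUnit("money"),
      "river": SemanticalUnit("river")}

observed = ["money", "account"]          # associations found in the input string
chosen = best_match([river_bank, finance], observed)
subset = select_subset(db, chosen.name, threshold=0.5)
```

In this sketch a heavily weighted pointer corresponds to a short semantical distance, so the strongly linked unit survives the threshold while the weakly linked one is left out of the subset.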



EP 0 962 873 A1



European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 98 10 9952



DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant passages:

EP 0 689 147 A (CANON KABUSHIKI KAISHA), 27 December 1995
* the whole document *

R. V. GUHA ET AL: "Enabling Agents to Work Together", COMMUNICATIONS OF THE ACM,
vol. 37, no. 7, July 1994, pages 127-142, XP000485267, New York, US
* the whole document *

M. CHUNG ET AL: "Applying Parallel Processing to Natural-Language Processing", IEEE EXPERT,
vol. 9, no. 1, February 1994, pages 36-44, XP000447480, Los Alamitos, CA, US
* the whole document *

Relevant to claim: 1-55

CLASSIFICATION OF THE APPLICATION (Int.Cl.6): G06F17/28, G06F17/27

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G06F



The present search report has been drawn up for all claims

Place of search: BERLIN
Date of completion of the search: 3 November 1998
Examiner: Abram, R



CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document

T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding document






